Abstract. The article defines and studies the genus of finite state
deterministic automata (FSA) and regular languages. Indeed, an FSA can
be seen as a graph, for which the notion of genus arises. At the same
time, an FSA has a semantics via its underlying language. It is then
natural to make a connection between languages and the notion of genus.
After we introduce and justify the notion of genus for regular
languages, the following questions are addressed. First, depending on the
size of the alphabet, we provide upper and lower bounds on the genus
of regular languages: we show that under a relatively generic condition
on the alphabet and the geometry of the automata, the genus grows at
least linearly in terms of the size of the automata. Second, we show that
the topological cost of the powerset determinization procedure is
exponential. Third, we prove that the notion of minimization is orthogonal
to the notion of genus. Fourth, we build regular languages of arbitrarily
large genus: the notion of genus defines a proper hierarchy of regular
languages.



Beyond the set-theoretic description of graphs, there is the notion of
an embedding of a graph in a surface. Intuitively speaking, an embedding
of a graph in a surface is a drawing without edge crossings. Planar graphs
are drawn on the sphere $S_0$, the graphs $K_5$ and $K_{3,3}$ are drawn on
the torus $S_1$ and, more generally, any graph can be drawn on some closed
orientable surface $S_k$, that is a sphere with $k$ "handles". The genus of a
graph $G$ is the minimal index $k$ such that $G$ can be drawn on $S_k$.




The aim of this work is to explore standard notions of finite state au- 
tomata (FSA) theory with this topological point of view. The novelty of 
this point of view lies in the fact that finite state automata are not only 
graphs, they are machines. These machines compute regular languages. 
The correspondence is onto: one language may be computed by infinitely 
many automata. It is then natural to define the genus of a regular lan- 
guage to be the minimal genus of its representing deterministic automata. 

It should be noted that the word "deterministic" in the previous
sentence is crucial: any regular language is recognized by some planar
nondeterministic automaton. The earliest reference for this result we
could find is [BoCh76]. The cost in terms of extra states and transitions
is analyzed in [BP99]. By contrast, we show in this paper the existence of
regular languages having arbitrarily high genus.

The use of topology in the study of languages may come as a surprise
at first. We suggest two motivations of very different nature. First, the
question arises naturally if one wants to physically build an FSA. Think
of Boolean circuits: they are also graph-machines. There is an immense
literature about their electronic implementation, that is about the layout
of Very-Large Scale Integration (VLSI) (for instance [CKC83]). In
particular, the problem of minimizing the number of vias is close to the
current one. Many contributions suppose a fixed number of layers (holes),
but some consider an arbitrary one [SHL90]. As we will show, a smaller
number of states does not necessarily mean a smaller cost in terms of the
electronic implementation.

There is a second and more fundamental reason why one should consider
topology in general, and the genus in particular, in the study of
regular languages. Low-dimensional topology is a natural tool for
estimating the complexity of languages (or the complexity of the
computation of languages). The main invariant of a regular language $L$ is
usually the number of states (the size) of the minimal automaton
recognizing $L$. This invariant describes the size of the table in which
transitions are stored, that is the size of the machine's memory. However,
simple counting costs memory without making the internal structure of
automata more complex. As a simple example, the language $L_n = \{a^n\}$ is
represented by an automaton of size $n + 2$ but with the simple shape of a
line:



The genus, as a complexity measure, was introduced for formal
logical proofs by R. Statman [Sta74] and further studied by A. Carbone
[Car09]. Cut-elimination is presented as a way of diminishing the
complexity of proofs, that is of simplifying proofs. We are not aware of
any other use of low-dimensional topology as a complexity measure besides
this work. To the best of our knowledge, classical textbooks on automata
theory (e.g. [HMU06], [Sak09], [RS97]) are devoted to the set-theoretic
approach. Our long-term objective is a topological study of well-known
constructions such as minimization, determinization, union,
concatenation, and so on. This paper is devoted to the notion of genus.

As a first step, we derive a closed formula for the genus of a
deterministic finite automaton (Theorem 5, the Genus Formula). Then we show
that under rather mild hypotheses on the size of the alphabet ($\geq 4$) and
on the geometry, the genus of a deterministic finite automaton increases at
least linearly in terms of the number of states (the Genus Growth Theorem).
Since the hypotheses depend only on the abstract representing automaton and
not on a particular embedding, we deduce an estimate of the genus of
regular languages (Theorem 7).

Theorem 1. Let $(L_n)_{n \geq 1}$ be a sequence of regular languages $L_n$ of
size $n$, with alphabet size $m \geq 4$. Assume that for any deterministic
automaton recognizing $L_n$, the number of cycles of length 1 and 2 is
negligible with respect to $n$. For any $\varepsilon > 0$, there is $N > 0$
such that for all $n > N$,



We present several remarkable consequences of this result throughout 
this paper. 

We mention two particular cases of interest. It is known that the size
of the union of two automata increases linearly with the product of their
respective sizes. We prove that the genus of the union of two automata A
and B increases linearly with the product of the sizes of A and B
(Corollary 4). We also provide an example of a nondeterministic automaton
A such that the genus of the powerset-determinized form of A is exponential up
to a linear factor with respect to the size of A (Theorem Isl). 
In a second step, we study further the link between languages, their
representation in terms of automata and their genus. The comparison
with state minimization is instructive. The Myhill-Nerode Theorem ensures
that two deterministic automata with the same minimal number of states
that recognize the same language must be isomorphic. We show that this
uniqueness property does not hold if we replace minimal number of states
by minimal genus. There is no simple analog of the Myhill-Nerode Theorem.
As a consequence, nonisomorphic automata representing the same
language may have minimal genus. There may even be nonisomorphic
automata of minimal size within the set of genus-minimal automata.

As a final step, we describe explicit languages having arbitrarily high
genus (Theorem 9). These results imply the existence of a nontrivial
hierarchy of regular languages based on the genus and yield a far-reaching
generalization of the results of [BoCh76, §4]. In particular, the genus
yields a nontrivial measure of complexity of regular languages.

1 Finite State Automata 

We briefly recall the main definitions of the theory of finite state automata
and regular languages. An alphabet is a (finite) set of letters. A word on
an alphabet $A$ is a finite sequence of letters in the alphabet. Let $A^*$ be
the set of all words on $A$; $\epsilon$ is the empty word and the
concatenation of two words $w$ and $w'$ is denoted by $w \cdot w'$. We define
repetitions as follows. Given some word $w$, let $w^0 = \epsilon$ and
$w^{n+1} = w^n \cdot w$.

A language on an alphabet $A$ is a subset of $A^*$. Given two languages
$L$ and $L'$, let $L + L'$, $L \cdot L'$ and $L^*$ denote respectively the
union, the concatenation and the star-operation on $L$ (and $L'$). Rational
languages (also called regular languages) are those languages built from
finite sets and the three former operations.

A (finite state) automaton is a 5-tuple $\mathsf{A} = (Q, A, q_0, F, \delta)$
with $Q$ a finite set of states among which $q_0$ is the initial state,
$F \subseteq Q$ is the set of final states, $A$ is an alphabet and
$\delta \subseteq Q \times A \times Q$ is the transition relation. The
relation $\delta$ extends to words by setting $\delta(q, \epsilon, q)$ for all
$q \in Q$ and by defining $\delta(q, a \cdot w, q')$ if and only if
$\delta(q, a, q'')$ and $\delta(q'', w, q')$ for some state $q'' \in Q$. Such
an automaton induces a language

$$\mathcal{L}_{\mathsf{A}} = \{ w \in A^* \mid \delta(q_0, w, q_f) \wedge q_f \in F \}.$$

The language $\mathcal{L}_{\mathsf{A}}$ is said to be recognized (or
represented) by $\mathsf{A}$. A fundamental result is Kleene's theorem.

Theorem 2 (Kleene). A language is regular if and only if it is recog- 
nized by some finite state automaton. 
Example 1. On the alphabet $\{a, b\}$, let us define the automaton
$\mathsf{F}$ on the left and $\mathsf{K}_5$ on the right:



The small arrows indicate the initial states and final states are doubly
circled. The language recognized by $\mathsf{F}$ is
$\mathcal{L}_{\mathsf{F}} = \{a^n \cdot b \mid n \in \mathbb{N}\} \cup \{(a \cdot b)^n \mid n \geq 0\} = a^* \cdot b + (a \cdot b)^*$.
The language recognized by $\mathsf{K}_5$ is composed of the words of
"weight" 0 modulo 5, the weight of $a$ being 1 and that of $b$ being 2.
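
The extension of the transition relation to words can be sketched in a few lines of Python. This is only an illustration: the function name `accepts` and the encoding of the relation as a set of triples are our own choices. The transitions used for K5 are an assumption (state i goes to i+1 on a and to i+2 on b, modulo 5, with state 0 both initial and final), chosen to match the description of the language by weights, since the picture of the automaton is not reproduced here.

```python
def accepts(delta, q0, finals, word):
    """Return True if some run of the automaton on `word` ends in a final state.

    `delta` is the transition relation, encoded as a set of triples (q, a, q').
    """
    current = {q0}  # states reachable after reading the prefix seen so far
    for letter in word:
        current = {q2 for (q1, a, q2) in delta if q1 in current and a == letter}
    return any(q in finals for q in current)


# Assumed transitions for K5: reading a adds 1, reading b adds 2 (modulo 5).
K5_DELTA = {(i, "a", (i + 1) % 5) for i in range(5)} | {
    (i, "b", (i + 2) % 5) for i in range(5)
}

print(accepts(K5_DELTA, 0, {0}, "aaba"))  # weight 1 + 1 + 2 + 1 = 5
print(accepts(K5_DELTA, 0, {0}, "ab"))    # weight 1 + 2 = 3
```

Tracking a set of current states handles nondeterministic relations as well; for a deterministic automaton the set never holds more than one state.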

An automaton $\mathsf{A} = (Q, A, q_0, F, \delta)$ is said to be deterministic
(resp. complete) if for any state $q \in Q$ and any symbol $a \in A$, the
cardinality of the set $\{q' \in Q \mid \delta(q, a, q')\}$ is at most one
(resp. at least one). In the case when $\mathsf{A}$ is deterministic and
complete, $\delta$ is actually a function $Q \times A \to Q$. In that case,
$\delta(q, u) = q'$ stands for $\delta(q, u, q')$. It is well known that every
regular language is recognized by a (complete) deterministic automaton. All
deterministic automata in this paper shall be finite and complete unless
stated otherwise.
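
The two conditions can be checked mechanically. The sketch below assumes the same set-of-triples encoding of the transition relation as the definition suggests; the helper names and the two-state example are hypothetical and only serve as an illustration.

```python
def successors(delta, q, a):
    """States q' such that (q, a, q') belongs to the relation delta."""
    return {q2 for (q1, x, q2) in delta if q1 == q and x == a}


def is_deterministic(states, alphabet, delta):
    """At most one successor for every pair (state, letter)."""
    return all(len(successors(delta, q, a)) <= 1 for q in states for a in alphabet)


def is_complete(states, alphabet, delta):
    """At least one successor for every pair (state, letter)."""
    return all(len(successors(delta, q, a)) >= 1 for q in states for a in alphabet)


# Hypothetical example: state 0 has two successors on "a", state 1 has none.
delta = {(0, "a", 0), (0, "a", 1)}
print(is_deterministic({0, 1}, {"a"}, delta), is_complete({0, 1}, {"a"}, delta))
```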

Example 2. The automaton $\mathsf{K}_5$ is deterministic, but $\mathsf{F}$ is
not. Nevertheless, $\mathcal{L}_{\mathsf{F}}$ is recognized by the automaton
$\mathsf{F}'$:



Note that the only function of the state symbolized by $\bot$ is to make the
automaton $\mathsf{F}'$ complete. It is traditionally called the "trash
state". Once this state is reached, the final states are unreachable.

Given a language $L$, a distinguishing extension of two words $u$ and $v$
is a word $w$ such that $u \cdot w \in L$ and $v \cdot w \notin L$. Let $R_L$
be the (equivalence) relation $u \, R_L \, v$ if and only if $u$ and $v$ have
no distinguishing extension.

Theorem 3 (Myhill-Nerode [Myh57, Ner58]). A language $L$ is regular if and
only if $R_L$ has finitely many equivalence classes.




Actually, the equivalence classes are the states of an automaton, called
the minimal automaton, which, remarkably, is the smallest deterministic
automaton recognizing $L$. By smallest, we mean the one with the minimal
number of states. Thus, the notion of size of an automaton $\mathsf{A}$,
denoted $|\mathsf{A}|$ in the sequel, is the number of states of
$\mathsf{A}$. We emphasize the definite article 'the' in the first sentence
of the paragraph to stress the fact that there is only one (up to
isomorphism) automaton of minimal size representing $L$.

Example 3. The automaton F' is minimal. 

The size of an automaton serves as an evaluation of its complexity (see
for instance [Yu00]). Due to Myhill-Nerode, one may define the complexity
of a regular language to be the size of its minimal automaton.

Example 4. There are regular languages of arbitrarily large complexity. For
instance, on the alphabet $A = \{a\}$, consider for all $n \geq 0$ the
language $L_n = \{a^n\}$ that consists of all words on $A$ of length $n$. The
linear automaton depicted in the introduction is the minimal automaton
representing $L_n$; it has size $n + 2$.
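
As an illustration, the linear automaton can be built explicitly. The sketch below is our own encoding (states numbered 0 to n+1, the last one playing the role of the trash state); the function names are hypothetical.

```python
def linear_automaton(n):
    """Complete DFA for {a^n} over {a}.

    States 0..n count the letters read so far; state n + 1 is the trash state.
    """
    trash = n + 1
    delta = {(i, "a"): i + 1 for i in range(n)}
    delta[(n, "a")] = trash      # one letter too many
    delta[(trash, "a")] = trash  # the trash state is absorbing
    return {"states": range(n + 2), "initial": 0, "finals": {n}, "delta": delta}


def run(dfa, word):
    """Return True if the DFA accepts `word`."""
    q = dfa["initial"]
    for letter in word:
        q = dfa["delta"][(q, letter)]
    return q in dfa["finals"]


dfa = linear_automaton(3)
print(len(dfa["states"]), run(dfa, "aaa"), run(dfa, "aa"), run(dfa, "aaaa"))
```

The state count is n + 2 by construction: n + 1 counting states plus the trash state.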

Proposition 1. If two words $u$ and $v$ have a distinguishing extension for
a regular language $L$, then, for any deterministic automaton $\mathsf{A}$
representing $L$, $\delta(q_0, u) \neq \delta(q_0, v)$.

Proof. Ad absurdum. Let $w$ be a distinguishing extension and suppose that
$\delta(q_0, u) = q_1 = \delta(q_0, v)$. Then
$\delta(q_0, u \cdot w) = \delta(q_1, w) = \delta(q_0, v \cdot w)$. Since
$u \cdot w \in L$, $\delta(q_1, w) \in F$. Thus $v \cdot w \in L$, in
contradiction with the hypothesis. ■

2 The genus of a regular language 

Let $\mathsf{A}$ be a finite automaton. In the constructions to follow, we
regard $\mathsf{A}$ as a graph where the vertices are the states and the
edges are the transitions (in particular, two vertices may be joined by
several edges). We simply forget about the extra structures on it (namely,
the orientation and the labels of the edges). We are interested in a class of
embeddings of $\mathsf{A}$ into oriented surfaces. Recall that a 2-cell is a
topological two-dimensional disc. An automaton is planar if it embeds into a
2-cell (or equivalently a sphere or a plane).

By means of elementary operations, one can show that $\mathsf{A}$ embeds into
a closed oriented surface $\Sigma$. Among all embeddings that share that
property, choose one such that the complement of the image of $\mathsf{A}$ in
$\Sigma$ is a disjoint union of a finite number of open 2-cells. Such an
embedding will be called a cellular embedding. Again by elementary
operations, one can show that there exists a cellular embedding of
$\mathsf{A}$.

As a very simple example, the automaton A that consists of one state 
and one loop embeds in the obvious fashion into the 2-sphere. Note that 
the geometrical realization of A coincides with the loop. 
The embedding is cellular because the complement of 
the loop is the union of two 2-cells. The same automaton 
embeds also into the torus T as depicted. In this case the 
embedding is not cellular because T \ A is a cylinder and not a disjoint 
union of 2-cells. 

Example 5. Another example is given by an automaton with one state 
and two loops. Of the two embeddings depicted into the torus T, the top 
one is noncellular and the bottom one is cellular. (One should identify 
the opposite sides of the square on the left side to obtain the embedding 
depicted on the right side.) 



In this context, the following observation is a tautology: 

Lemma 1. A cellular embedding of an automaton $\mathsf{A} \subset \Sigma$
determines a finite CW-complex decomposition of the surface $\Sigma$ in which
the 1-skeleton of $\Sigma$ is the image of $\mathsf{A}$.

A CW-complex is a topological space made up of $k$-dimensional cells.
Here we use 0-cells (points, corresponding to states), 1-cells (topological
segments, corresponding to transitions) and 2-cells (topological discs). For
the precise definition of a CW-complex decomposition, see for instance
[Bre93, Chap. IV, §8]. For instance, the cellular embedding of $\mathsf{A}$
into the torus $T$ of Example 5 induces a CW-complex decomposition of the
torus that consists of one 0-cell (induced by the unique state of
$\mathsf{A}$), two 1-cells (induced by the two transitions of $\mathsf{A}$)
and one 2-cell (thought of as the complement of $\mathsf{A}$ in $T$).

Recall that the genus of a closed oriented surface $\Sigma$ is the integer
$g = \frac{1}{2} \dim H_1(\Sigma; \mathbb{R})$. In our context, it is useful
to note that the genus of $\Sigma$ is the maximal number of disjoint cycles
that can be removed from $\Sigma$ such that the complement remains connected.
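
For computations it is often convenient to read the genus off a CW-complex decomposition through the Euler characteristic, a standard fact recalled here for convenience. Writing $e_0$, $e_1$ and $f$ for the numbers of 0-cells, 1-cells and 2-cells (the symbol $f$ is our notation at this point), one has:

```latex
\[
  e_0 - e_1 + f \;=\; \chi(\Sigma) \;=\; 2 - 2g .
\]
% Worked instance: one 0-cell, two 1-cells and one 2-cell,
% as in the decomposition of the torus described above.
\[
  1 - 2 + 1 \;=\; 0 \;=\; 2 - 2g
  \quad\Longrightarrow\quad g = 1 .
\]
```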

Definition 1. A cellular embedding of $\mathsf{A}$ into $\Sigma$ is minimal if
the genus of $\Sigma$ is minimal among all possible surfaces into which
$\mathsf{A}$ embeds cellularly.

Example 6. The second embedding of Example 5 is cellular: the complement
of $\mathsf{A}$ consists of one open 2-cell. It is not minimal. Indeed, the
automaton embeds into the 2-sphere $S^2$: it is realized as the wedge of two
circles (whose complement in $S^2$ consists of three open 2-cells).

Definition 2. The genus $g(\mathsf{A})$ of a finite deterministic automaton
$\mathsf{A}$ is the genus of $\Sigma$ where $\Sigma$ is a closed oriented
surface into which $\mathsf{A}$ embeds minimally.

Example 7. The genus of the automaton that consists of one state and
an arbitrary number of loops is zero because it embeds into the 2-sphere.

Let $g_{\mathsf{A}}$ be the smallest number $g_\Sigma$ where $\Sigma$ is a
closed oriented surface into which $\mathsf{A}$ can be embedded. Then
$g_{\mathsf{A}} \leq g(\mathsf{A})$ (since all possible embeddings, including
noncellular ones, are considered).

Theorem 4 (J.W.T. Youngs [You63]). For any automaton $\mathsf{A}$,
$g_{\mathsf{A}} = g(\mathsf{A})$. In other words, an embedding with minimal
genus is cellular.

We shall use Youngs' result throughout this paper. 

Example 8. Consider the example of the graph $K_5$, the complete graph
on five vertices. It is well known that $K_5$ is not planar. Embed it into the
torus $T$ as depicted in Fig. 1. Since the torus has genus 1, the embedding
is minimal. One verifies that it is also cellular: the complement of $K_5$ in
$T$ consists of five disjoint open 2-cells.

We can now formally state the definition of the genus of a regular 
language. 

Definition 3. Let $L$ be a regular language. The genus $g(L)$ of $L$ is the
minimal genus of a complete finite deterministic automaton recognizing $L$:

$$g(L) = \min\{ g(\mathsf{A}) \mid L = \mathcal{L}_{\mathsf{A}},\ \mathsf{A} \text{ complete finite deterministic} \}.$$

Fig. 1. A cellular embedding of the graph $K_5$ in the torus $T$.



There is a simple upper bound for the genus of a deterministic au- 
tomaton. 

Proposition 2. Let $\mathsf{A}$ be a deterministic automaton with $m$ letters
and $n$ states. Then

$$g(\mathsf{A}) \leq mn.$$

Proof. This follows from Euler's formula ([T]). I 
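
One way to spell out this argument is sketched below. It assumes that the underlying graph is connected (for instance when all states are reachable), so that Euler's formula $e_0 - e_1 + f = 2 - 2g$ applies to a minimal, hence cellular, embedding with $e_0$ vertices, $e_1$ edges and $f$ faces; the notation $e_0$, $e_1$, $f$ is ours at this point of the text.

```latex
\begin{align*}
  e_0 &= n, \qquad e_1 \le mn, \qquad f \ge 1, \\
  2 - 2g(\mathsf{A}) &= e_0 - e_1 + f \;\ge\; n - mn + 1, \\
  g(\mathsf{A}) &\le \frac{1 + (m-1)\,n}{2} \;\le\; mn .
\end{align*}
```

The last inequality holds because $1 + (m-1)n \le 2mn$ is equivalent to $1 \le (m+1)n$, which is true as soon as $m, n \ge 1$.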

Given some fixed alphabet, Prop. 2 shows that the genus of a regular
language $L$ is at most the size of a minimal automaton recognizing $L$,
up to some linear factor. Hence the main problem we face is to compute
a lower bound for the genus (see Th. 7).

The next two results deal with the completeness of automata and
reachable states. They are instrumental in nature: they say that the
completion of an automaton of minimal genus and the suppression of its
unreachable states do not modify the genus. These facts will be used in the
sequel without further notice.

Proposition 3. For any regular language $L$ with genus $g$, there is a
complete, deterministic automaton of genus $g$ representing $L$.

Proof. Let $L$ be a regular language with genus $g$. Then there is a
deterministic automaton $\mathsf{A} = (Q, A, q_0, F, \delta)$ representing $L$
that embeds cellularly in a surface $\Sigma$ of genus $g(\mathsf{A}) = g$.
First, to any state $q$ of $\mathsf{A}$ that is not complete, add a new trash
state $\bot_q$ with the transitions $\delta(q, a) = \bot_q$ for all letters
$a$ such that $\delta(q, a)$ is not defined. Second, to each of these new
trash states, add loops $\delta(\bot_q, a) = \bot_q$ for all $a \in A$.
Clearly, the new transitions embed into $\Sigma$ and do not modify the genus
of $\mathsf{A}$. ■
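
The completion used in this proof can be sketched as follows. The encoding is an assumption of ours: a partial deterministic transition function is a dictionary from (state, letter) to state, and the new trash states are named by hypothetical tuples `("trash", q)`.

```python
def complete(states, alphabet, delta):
    """Complete a partial DFA by attaching a private trash state to each
    incomplete state, as in the proof above.

    `delta` maps (state, letter) to a state; missing keys are undefined
    transitions. Returns the new list of states and the new transition map.
    """
    new_states = list(states)
    new_delta = dict(delta)
    for q in states:
        missing = [a for a in alphabet if (q, a) not in delta]
        if not missing:
            continue
        trash = ("trash", q)               # hypothetical name for the new state
        new_states.append(trash)
        for a in missing:
            new_delta[(q, a)] = trash      # redirect undefined transitions
        for a in alphabet:
            new_delta[(trash, a)] = trash  # loops make the trash state complete
    return new_states, new_delta


states, delta = complete([0, 1], "ab", {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1})
print(states)
```

Using one trash state per incomplete state, rather than a single shared one, keeps each new state attached to exactly one old state, so the added edges and loops can be drawn inside a face next to that state.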
A state $q$ of a (deterministic) automaton
$\mathsf{A} = (Q, A, q_0, F, \delta)$ is said to be reachable if there is a
word $w$ such that $\delta(q_0, w) = q$.

Proposition 4. For any regular language L with genus g, there is a de- 
terministic, complete automaton A of genus g representing L such that all 
states of A are reachable. 

Proof. Consider an automaton $\mathsf{A}$ of genus $g$ representing $L$.
Remove all unreachable states and the corresponding transitions from
$\mathsf{A}$. The language recognized by the modified automaton is still $L$.
Being a subgraph of $\mathsf{A}$, the new automaton has a genus smaller than
or equal to $g$. Since it represents the same language $L$, its genus must be
equal to $g$. All its states are reachable. ■
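
The removal of unreachable states is a plain graph search. The following sketch uses a breadth-first search on a dictionary encoding of the transition function; the names `reachable_states` and `trim` are hypothetical.

```python
from collections import deque


def reachable_states(initial, alphabet, delta):
    """Set of states q with delta(q0, w) = q for some word w."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        q = queue.popleft()
        for a in alphabet:
            if (q, a) in delta and delta[(q, a)] not in seen:
                seen.add(delta[(q, a)])
                queue.append(delta[(q, a)])
    return seen


def trim(initial, finals, alphabet, delta):
    """Drop unreachable states together with their outgoing transitions."""
    keep = reachable_states(initial, alphabet, delta)
    new_delta = {(q, a): r for (q, a), r in delta.items() if q in keep}
    return keep, finals & keep, new_delta


print(trim(0, {0, 1}, "a", {(0, "a"): 0, (1, "a"): 0}))
```

Transitions leaving a reachable state always end in a reachable state, so filtering on the source state alone is enough.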



2.1 Combinatorial cycles and faces 

In this paragraph, we introduce cycles and faces. A cycle is a notion that
depends only on the abstract graph, while a face depends on a cellular
embedding of the graph. The notion of faces is crucial in the Genus Formula
(Theorem 5) and instrumental in the Genus Growth Theorem.
Definition 4. Let $p \geq 1$. A walk in $\mathsf{A}$ is a finite alternating
sequence of vertices (states) and edges $s_0, t_1, s_1, t_2, \ldots, t_p, s_p$
of $\mathsf{A}$ such that for each $j = 1, \ldots, p$, the states $s_{j-1}$
and $s_j$ are the endpoints of the edge $t_j$. The length of the walk is the
number of edges (counting repetitions). An internal vertex of the walk is any
vertex in the walk, distinct from the first vertex $s_0$ and the last vertex
$s_p$. The walk is closed if the first vertex is the last vertex,
$s_0 = s_p$.

Recall that we regard A as an unoriented graph: one can walk along 
an edge opposite to the original orientation of the transition. The edge 
should be nonempty: there should be an actual transition in one direction 
or the other. In particular, if there is no transition from a state s to itself, 
then the vertex s cannot be repeated in the sequence defining a walk. 

If the underlying graph is simple, then we suppress the notation of the
edges: a walk is represented by a sequence of vertices $s_0, s_1, \ldots, s_p$
such that any two consecutive vertices are adjacent.

Definition 5. Consider the set $W(p)$ of closed walks of length $p$ in
$\mathsf{A}$. The group of cyclic permutations of $\{1, \ldots, p\}$ acts on
$W(p)$. A combinatorial cycle of length $p$, or simply a $p$-cycle, is an
orbit of a closed walk of length $p$.

In other words, two closed walks represent the same combinatorial
cycle if there is a cyclic permutation that sends one onto the other. This
definition is motivated by the fact that we are interested in geometric
cycles only and we do not want to count them with multiplicities with respect
to the choice of the starting node.
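
To make the quotient by cyclic permutations concrete, here is a small enumeration sketch. The encoding is our own: the graph is a list of edges given by their endpoints (repeated pairs and loops allowed), and a step of a walk is a pair (vertex left, index of the edge taken). Following Definition 5, only cyclic shifts are identified, so a cycle and the same cycle travelled backwards are counted separately.

```python
def closed_walks(edges, p):
    """Yield the closed walks of length p, each as a tuple of steps.

    `edges` is a list of pairs (u, v) of endpoints; a step (s, i) means
    that the walk leaves vertex s along edge number i.
    """
    # steps_from[s] lists (edge index, vertex reached) for each edge at s.
    steps_from = {}
    for i, (u, v) in enumerate(edges):
        steps_from.setdefault(u, []).append((i, v))
        if u != v:  # a loop gives a single step, not two
            steps_from.setdefault(v, []).append((i, u))

    def extend(start, current, walk):
        if len(walk) == p:
            if current == start:
                yield tuple(walk)
            return
        for i, reached in steps_from[current]:
            yield from extend(start, reached, walk + [(current, i)])

    for start in steps_from:
        yield from extend(start, start, [])


def count_cycles(edges, p):
    """Number of combinatorial p-cycles: closed walks up to cyclic shift."""
    representatives = set()
    for walk in closed_walks(edges, p):
        rotations = [walk[k:] + walk[:k] for k in range(p)]
        representatives.add(min(rotations))  # canonical representative of the orbit
    return len(representatives)


triangle = [(0, 1), (1, 2), (2, 0)]
print([count_cycles(triangle, p) for p in (1, 2, 3)])
```

The enumeration is exponential in p and is only meant for very small graphs.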

Remark 1. Our definition of a cycle departs from the traditional one in
graph theory: repetitions of edges and internal vertices may occur. A
combinatorial cycle in which no edge occurs more than once will be called
a simple cycle. (We still allow repetition of an internal vertex that has
several loops.)

We denote the set of all $p$-cycles in $\mathsf{A}$ by $Z_p(\mathsf{A})$.
Since $\mathsf{A}$ is finite, $Z_p(\mathsf{A})$ is finite. We set
$z_p = |Z_p(\mathsf{A})|$.

The definitions of walks and cycles are intrinsic to the graph: they do
not depend on an embedding (or a geometric realization) of the graph.
However, they are directly related to topology once an embedding is given.
Let $\mathsf{A}$ be an automaton embedded in a surface $\Sigma$. Each
combinatorial cycle determines a geometric 1-cycle (in the sense of singular
homology) in $\Sigma$. Therefore combinatorial cycles are thought of as
combinatorial analogues of singular 1-cycles.

In what follows, consider a cellular embedding of $\mathsf{A}$ into a closed
oriented surface $\Sigma$. By definition, the set
$\pi_0(\Sigma - \mathsf{A})$ of connected components of $\Sigma - \mathsf{A}$
consists of a finite number of 2-cells. The image in $\Sigma$ of the set
$\mathsf{A}^1$ of edges of $\mathsf{A}$ is the 1-skeleton $\Sigma^1$ of
$\Sigma$. With a slight abuse of notation, we shall denote by the same symbol
$\Sigma^1$ the collection of embedded edges of $\mathsf{A}$. Consider an edge
$e \in \Sigma^1$ and an open 2-cell $c \in \pi_0(\Sigma - \mathsf{A})$.
It follows from the definitions that if $\mathrm{Int}(e)$ and $\mathrm{Fr}(c)$
intersect nontrivially then $e \subset \mathrm{Fr}(c)$. Since $\Sigma$ is a
2-manifold, there is at most one component $c'$ of $\Sigma - \mathsf{A}$,
$c' \neq c$, such that $e \subset \mathrm{Fr}(c')$.

Without loss of generality, we may assume that the embedded edge
$e$ is a smooth arc. Let $x$ be a point in $e$. Define a small nonzero normal
vector $\vec{n}$ at $x$. If $\vec{n}$ and $-\vec{n}$ point to distinct
components $c$, $c'$ of $\Sigma - \mathsf{A}$, then
$e \subset \mathrm{Fr}(c) \cap \mathrm{Fr}(c')$: there are two distinct
components separated by $e$. In this case we say that $e$ is bifacial. If
$\vec{n}$ and $-\vec{n}$ point to the same component $c$ of
$\Sigma - \mathsf{A}$, then $c$ is the unique component of
$\Sigma - \mathsf{A}$ such that $e \subset \mathrm{Fr}(c)$. In this case, one
says that the edge $e$ is monofacial.

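We define a pairing
$(-, -) : \Sigma^1 \times \pi_0(\Sigma - \mathsf{A}) \to \{0, 1, 2\}$ by

$$(e, c) = \begin{cases}
0 & \text{if } e \cap \mathrm{Fr}(c) = \emptyset, \\
1 & \text{if } e \subset \mathrm{Fr}(c) \text{ and } e \subset \mathrm{Fr}(c') \text{ for some } c' \in \pi_0(\Sigma - \mathsf{A}) - \{c\}, \\
2 & \text{if } c \text{ is the unique component of } \Sigma - \mathsf{A} \text{ such that } e \subset \mathrm{Fr}(c).
\end{cases}$$

From the discussion above, it follows that for any edge $e \in \Sigma^1$,

$$\sum_{c \in \pi_0(\Sigma - \mathsf{A})} (e, c) = 2. \qquad (1)$$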

Definition 6. Let $k \geq 1$. A component $c$ of $\Sigma - \mathsf{A}$ is a
$k$-face if

$$\sum_{e \in \Sigma^1} (e, c) = k.$$

The set of $k$-faces is denoted $F_k$. We let $f_k$ denote the number of
elements in $F_k$. A face is a $k$-face for some $k \geq 1$. The set of faces
is denoted $F$.

The following properties hold:

(1) The sets $F_k$, $k \geq 1$, are disjoint;

(2) All sets $F_k$ but finitely many are empty;

(3) $F = \bigcup_{k \geq 1} F_k = \pi_0(\Sigma - \mathsf{A})$.

Definition 7. Let $k \geq 1$. A combinatorial $k$-gon in $\Sigma$ is a
$k$-cycle of $\mathsf{A}$ that bounds a $k$-face of $\Sigma - \mathsf{A}$.
A 2-gon will also be called a bigon. A cycle of length 1 will be called a
loop.

Lemma 2. Any 1-gon has a bifacial edge.

The proof follows from the more general fact that a contractible simple
closed curve is separating. See §8.1 for a proof.

The automaton depicted opposite has two states; each state has three
outgoing transitions. It is cellularly embedded into the plane. All edges
are bifacial. The edge $\alpha$ is a loop contractible in $\Sigma$ but is not
a 1-gon; $\alpha\beta$ is a cycle of length 2 that is a bigon; $\gamma\delta$
is a cycle of length 2, contractible in $\Sigma$, that is not a bigon.

According to Lemma 2, a cycle of length 1 is monofacial if and only
if it is not a 1-gon (if and only if it represents a nontrivial element in
1-homology). A bigon may have monofacial edges, even in a cellular
embedding: the cellular embedding of Example 5 provides such an instance.
Remark 2. For any $k \geq 1$, $f_k \leq z_k$. Indeed, just as in homology, a
combinatorial cycle does not necessarily bound a combinatorial face. For
instance, in the graph $K_5$ embedded in the torus as in Example 8, the simple
cycle BCDB of length 3 is not a 3-gon. In fact, it does not bound any 2-cell.

Lemma 3 (1-gon lemma). There exists a minimal embedding of A such 
that any cycle of length one is a 1-gon and in particular, is bifacial. 

See ^ for a proof. In the sequel, we shall frequently use Lemma 3
without further notice.

By contrast, there exist automata for which there is no embedding such that
every cycle of length two is a bigon. A simple example can be constructed
using the subgraph opposite.

Our definition of a combinatorial cycle mimics that of a geometric
1-cycle $c$ in the sense of a singular 1-chain such that $\partial c = 0$. We
remark that in order to represent a singular 1-cycle by a combinatorial
cycle, the combinatorial cycle in question may have repetitions of edges and
internal vertices.

For instance, consider anew the cellular embedding of Example 8. As
mentioned, the complement of $K_5$ in $T$ has five components which are open
2-cells. Four of them are 3-faces bounded respectively by ABCA, ACDA, ADEA
and AEBA. The fifth component is an open 2-cell: removing the other four
open 2-cells and the edges BD and CE yields a 2-cell. (Removing the
four open 2-cells yields a punctured torus, which is a regular neighborhood
of the wedge of a meridian and a longitude; removing the edges BD
and CE amounts to cutting transversally the meridian and the longitude
respectively, yielding a topological 2-cell.)

This fifth 2-cell is a bit more complicated to describe: it is not bounded
by any simple cycle. It is bounded by the closed walk BCEBDECDB which
represents a combinatorial cycle of length 8. It therefore represents an
8-gon. Note that the monofacial edges CE and BD are travelled twice in
opposite orientations.
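
The face counts of this example can be double-checked numerically. The sketch below relies on two standard facts rather than on anything specific to this paper: every edge is counted twice in total along the boundaries of the faces (equation (1)), and the Euler characteristic of a closed oriented surface of genus g is 2 - 2g. The function name is hypothetical.

```python
def genus_from_faces(n_vertices, n_edges, face_lengths):
    """Genus of the surface carrying a cellular embedding of a connected graph.

    `face_lengths` lists, for every face, the length k of its boundary walk.
    """
    if sum(face_lengths) != 2 * n_edges:
        raise ValueError("every edge must appear twice along face boundaries")
    euler = n_vertices - n_edges + len(face_lengths)
    if (2 - euler) % 2 != 0:
        raise ValueError("these counts cannot come from an oriented surface")
    return (2 - euler) // 2


# K5 in the torus: four 3-faces and one 8-face, 5 vertices, 10 edges.
print(genus_from_faces(5, 10, [3, 3, 3, 3, 8]))
```

Here 4 * 3 + 8 = 20 = 2 * 10, and 5 - 10 + 5 = 0 gives genus 1, in agreement with the torus.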
2.2 Digression: face embeddings and strong face embeddings

This paragraph is not necessary to understand our results and their proofs
(and hence may be skipped on a first reading). Indeed they do not depend
on the notions introduced here. In particular, they do not depend
on whether the Strong Embedding Conjecture (or a related conjecture)
is true or not. It is true however that for a graph that has a strong
embedding in a surface of minimal genus, the Genus Formula has a
particularly simple form. (However, it is known that this need not always be
the case: there exist 2-connected graphs of genus 1 that have no strong
cellular embedding in a torus, see [Xuo77].) We include this paragraph for
clarification.

As we have seen, $k$-gons that appear in cellular embeddings need not
be simple. We may request them to be, at the expense of a more restrictive
definition.

Definition 8. A face embedding of a graph $\mathsf{A}$ into a closed oriented
surface $\Sigma$ is a cellular embedding of $\mathsf{A}$ into $\Sigma$ such
that each $k$-face in $\Sigma - \mathsf{A}$ is bounded by a simple $k$-gon.

Opposite is depicted another embedding of $K_5$ into the torus (the opposite
sides in the square are identified as usual). This embedding is a face
embedding of $K_5$: the complement of $K_5$ consists of five 4-faces. Hence
this embedding is not equivalent to the cellular embedding of Example 8.

Note that a face embedding does not rule out the possibility that an
edge be monofacial. Recall that $e$ is bifacial if there is a component $c$ in
$\Sigma - \mathsf{A}$ such that $(e, c) = 1$.

Definition 9. Let $e$ be an edge of a graph $\mathsf{A}$ embedded into a
closed oriented surface $\Sigma$. A strong face embedding of a graph
$\mathsf{A}$ into a closed oriented surface $\Sigma$ is a face embedding of
$\mathsf{A}$ into $\Sigma$ such that every edge is bifacial.

For instance, in Example 8 all but the edges BD and CE are bifacial.
The face embedding of $K_5$ above is strong. The second embedding of
Example 5 is a face embedding that is not strong.

This definition is related to that of strong cellular embedding: a strong
cellular embedding of $\mathsf{A}$ is an embedding in $\Sigma$ such that the
closure of each connected component of $\Sigma - \mathsf{A}$ is a closed
2-cell. Equivalently, every $k$-face of $\Sigma - \mathsf{A}$ is bounded by a
true simple cycle without repetition of an internal vertex. A strong cellular
embedding is a strong face embedding. The converse does not hold in general.

The strong cellular embedding conjecture [Jae85] is that every
2-connected graph has a strong cellular embedding into some closed surface
(orientable or not). Even though the question is theoretically simpler, we
do not know whether every 2-connected graph has a strong face embedding
into some closed surface (orientable or not).

3 Genus Formula 

Our first main result is a closed formula for the genus of a regular lan- 
guage. 

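Theorem 5 (Genus formula). Let $\mathsf{A}$ be a deterministic automaton with
$m$ letters. Then for any cellular embedding of $\mathsf{A}$,

$$g(\mathsf{A}) \leq 1 - \frac{m+1}{4m} f_1 - \frac{1}{2m} f_2 + \frac{m-3}{4m} f_3 + \frac{2m-4}{4m} f_4 + \sum_{k \geq 5} \frac{k(m-1) - 2m}{4m} f_k \qquad (2)$$

with equality if and only if the embedding is minimal.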
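
The right-hand side can be evaluated mechanically from the face counts. The sketch below assumes that the coefficient of $f_k$ is $\frac{k(m-1) - 2m}{4m}$ for every $k \geq 1$, which is what Euler's formula gives for a complete deterministic automaton with $n$ states and $mn$ transitions once one uses $\sum_k k f_k = 2mn$. The function name and the dictionary encoding are ours, and the two checks reuse the face counts of the embeddings of $K_5$ in the torus described in Section 2, viewing $K_5$ as the underlying graph of a 5-state automaton on 2 letters.

```python
from fractions import Fraction


def genus_bound(m, face_counts):
    """Evaluate 1 + sum_k (k(m-1) - 2m)/(4m) * f_k exactly.

    `m` is the number of letters; `face_counts` maps a face length k
    to the number f_k of k-faces of the cellular embedding.
    """
    total = Fraction(1)
    for k, f_k in face_counts.items():
        total += Fraction(k * (m - 1) - 2 * m, 4 * m) * f_k
    return total


print(genus_bound(2, {3: 4, 8: 1}))  # four 3-faces and one 8-face
print(genus_bound(2, {4: 5}))        # five 4-faces
```

Exact rational arithmetic avoids rounding issues, since individual terms are fractions even though the total is an integer for an actual embedding.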

The numbers of faces $f_1, f_2, \ldots$ are determined by the cellular
embedding of $\mathsf{A}$. It follows from §2.1 that for each cellular
embedding, there is some $M > 0$ such that $f_k = 0$ for all $k > M$. In
particular, the sum $\sum_{k \geq 5} \frac{k(m-1) - 2m}{4m} f_k$ that appears
on the right hand side of (2) is finite.

Remark 3. In the case when (2) is an equality, it is not claimed that the
embedding is unique. Thus inequivalent minimal embeddings for $\mathsf{A}$
lead to distinct formulas for the genus of $\mathsf{A}$.

Remark 4. In the case when (2) is an equality, it is not claimed that
the automaton $\mathsf{A}$ is the minimal state automaton (in the sense of
Myhill-Nerode). Indeed, the automaton with the least number of states does
not necessarily have minimal genus (see below).

The Genus Formula is proved in ^ 

The following corollary is a straightforward consequence. 

Corollary 1. Let L be a regular language on m letters. For any minimal 
embedding of a deterministic automaton A representing L, 



(3) 
4 Genus growth

Let us begin with a simple example. Any language on a 1-letter alphabet
is represented by a planar deterministic automaton. Indeed, these
deterministic automata have a quasi-loop (planar) shape:







Actually, there is an indirect proof of the result. For a unary alphabet,
$e_1 \leq e_0$. Since there is at least one face, Euler's relation states
that $1 \leq 2 - 2g$, that is $g \leq 1/2$, and thus $g = 0$. This remark
shows that to get graphs of higher genus, one should increase the number of
edges. The relation $e_1 = m e_0$ then forces us to increase the size of the
alphabet. Thus a general study of the genus of automata depends on the size
of the alphabet. Consider now the case of a 2-letter alphabet.

Proposition 5. Let $(\mathsf{A}_n)_{n \geq 1}$ be a sequence of deterministic
finite automata of size $n$ on the same alphabet. Assume that the number $m$
of letters is two. For any cellular embedding of $\mathsf{A}_n$,

• either there exists $M > 0$ such that $\sup_{n \geq 1} \left( f_1(n) + f_2(n) + f_3(n) \right) < M$

• or $\lim_{n \to +\infty} \sum_{k \geq 5} \frac{k-4}{8} f_k(n) = +\infty$.

Proof. Suppose neither condition is satisfied. Then

$f_1(n) \to +\infty$ or $f_2(n) \to +\infty$ or $f_3(n) \to +\infty$ as $n \to +\infty$

(since these are sequences of nonnegative integers) and
$\sum_{k \ge 5} f_k(n)$ remains bounded. For $m = 2$, the second, third and
fourth terms respectively in the genus inequality (2) are negative or zero.
The fifth term is always zero for $m = 2$. It follows easily that $g(n)$ is
negative for $n$ large enough, which is a contradiction. ■

Corollary 2. Let $(L_n)_{n \in \mathbb{N}}$ be a sequence of regular
languages on two letters. If for each $n$, $L_n$ is recognized by a
deterministic automaton $A_n$ of size $n$ having a cellular embedding such
that $\sum_{k \ge 5} f_k(n)$ remains bounded as $n \to +\infty$, then the
genus $g(L_n)$ of $L_n$ also remains bounded.

Proof. By hypothesis, the numbers $f_k(n)$ of $k$-faces in a cellular
embedding of $A_n$ into a surface $\Sigma_n$ satisfy
$\sup_n \sum_{k \ge 5} f_k(n) < +\infty$. By Prop. 5, $f_1(n)$, $f_2(n)$
and $f_3(n)$ are bounded. The genus formula then shows that $g(A_n)$
remains bounded. Since $g(L_n) \le g(A_n)$, the conclusion follows. ■

Each $A_n$ in general has several nonequivalent cellular embeddings. But
for any cellular embedding, the alternative of Prop. 5 holds for the various
numbers of faces $f_k(n)$ (determined by the embedding). Corollary 2 states
a sufficient condition for the genus of a language to be bounded. The
interest of this result lies in the fact that it discriminates between the
respective contributions of the faces to the genus.

There is a similar result when the number of letters is three. 

Proposition 6. Let $(A_n)_{n \in \mathbb{N}^{\times}}$ be a sequence of
deterministic finite automata of size $n$ on the same alphabet. Assume that
the number $m$ of letters is three. For any minimal embedding of $A_n$,

• either there exists $M > 0$ such that $\sup_{n \ge 1} (f_1(n) + f_2(n)) < M$

• or $\lim_{n \to +\infty} \sum_{k \ge 4} f_k(n) = +\infty$.

Corollary 3. Let $(L_n)_{n \in \mathbb{N}}$ be a sequence of regular
languages on three letters. If for each $n$, $L_n$ is recognized by a
deterministic automaton $A_n$ of size $n$ having a cellular embedding such
that $\sum_{k \ge 4} f_k(n)$ remains bounded as $n \to +\infty$, then the
genus $g(L_n)$ of $L_n$ also remains bounded.

The proofs of Prop. 6 and Cor. 3 are similar to those of Proposition 5
and Corollary 2.

We state our main result on the genus growth of automata and lan- 
guages. 

Theorem 6 (Genus Growth). Let $(A_n)_{n \in \mathbb{N}^{\times}}$ be a
sequence of deterministic finite automata with $m$ letters and $n \ge 1$
states. Let $g(n)$ be the genus of $A_n$. Assume

(1) $m \ge 4$.

(2) The numbers $z_k(n)$ of cycles of length 1 and 2 in $A_n$ are negligible
with respect to the size $n$ of $A_n$:
$\lim_{n \to +\infty} z_1(n)/n = \lim_{n \to +\infty} z_2(n)/n = 0$.

Then for any $\varepsilon > 0$, there exists $N > 0$ such that for all $n \ge N$,

\[ g(n) \ge 1 + \left( \frac{m-3}{6m} - \varepsilon \right) m n. \]



The Genus Growth Theorem is proved in §10.

We begin with examples showing that we cannot easily dispense with
the hypotheses of Theorem 6.

Example 9 (Quasi-loop automaton). Any quasi-loop finite deterministic
automaton with alphabet of cardinality $m \le 1$ is planar. This of course
does not contradict the Genus Growth Theorem because Hypothesis (1)
does not hold.

Example 10 (Genus 1 automaton). Let $n \ge 3$. Define an automaton $A_n$
as follows. Consider the set $S_n = \{(i, j) \mid 0 \le i, j < n\}$ inside
the square $C = [0, n] \times [0, n]$. The quotient $T$ of $C$ under the
identifications $(0, t) = (n, t)$ and $(t, 0) = (t, n)$ is a torus. The
image $\bar{S}_n$ of $S_n$ in $T$ is the set of states. Note that there are
exactly $n^2$ states. For each $(i, j) \in (\mathbb{Z}/n\mathbb{Z})^2$,
define two outgoing transitions $(i, j) \to (i + 1, j) \pmod n$ and
$(i, j) \to (i, j + 1) \pmod n$. Choosing arbitrary initial and final states
yields a finite deterministic complete automaton $A_n$ with $n^2$ states.
Clearly $g(A_n) \le 1$ for any $n \ge 3$. (This also follows from Cor. 2.)
This does not contradict the Genus Growth Theorem because the alphabet has
only two letters. The same example with an extra outgoing transition
$(i, j) \to (i + 1, j + 1)$ (i.e., with an extra letter for the alphabet)
still yields an automaton $B_n$ with $g(B_n) \le 1$ for any $n \ge 3$. (This
also follows from Cor. 3.) This still does not contradict the Genus Growth
Theorem: the alphabet has only three letters.
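The bound $g \le 1$ in Example 10 can be checked mechanically by tracing the faces of the natural embedding in the torus and applying Euler's relation. The following is a minimal sketch under stated assumptions: the function name, the encoding of a dart as a pair (state, direction index), and the convention that the directions are listed counterclockwise are all choices made here for illustration.

```python
def torus_embedding_genus(n, directions):
    """Genus of the surface carrying the natural embedding of the n x n
    grid automaton, computed from a rotation system.

    directions -- the transition vectors together with their opposites,
                  listed in counterclockwise order around a state
    """
    deg = len(directions)
    opposite = [directions.index((-dx, -dy)) for dx, dy in directions]

    def next_dart(dart):
        (i, j), d = dart
        dx, dy = directions[d]
        head = ((i + dx) % n, (j + dy) % n)
        # At the head of the edge, turn to the direction that follows
        # the one we arrived from in the cyclic order.
        return head, (opposite[d] + 1) % deg

    darts = {((i, j), d) for i in range(n) for j in range(n) for d in range(deg)}
    faces = 0
    while darts:
        start = darts.pop()
        faces += 1
        dart = next_dart(start)
        while dart != start:
            darts.remove(dart)
            dart = next_dart(dart)

    vertices = n * n
    edges = n * n * deg // 2
    euler = vertices - edges + faces      # equals 2 - 2g
    return (2 - euler) // 2

TWO_LETTERS = [(1, 0), (0, 1), (-1, 0), (0, -1)]
THREE_LETTERS = [(1, 0), (1, 1), (0, 1), (-1, 0), (-1, -1), (0, -1)]
```

With `TWO_LETTERS` the traced faces are squares, and with `THREE_LETTERS` they are triangles; in both cases the Euler characteristic vanishes, which exhibits an embedding of genus 1 and hence the upper bound stated in the example.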

Example 11 (Another genus 1 automaton). Start with the previous example
$B_n$. To each state $(i, j)$, add an outgoing transition pointing to
$(i, j)$. This yields a deterministic complete automaton with $n^2$ states
and an alphabet that consists now of 4 letters. There is an obvious cellular
(minimal) embedding in the torus $T$ as before. This does not contradict
the Genus Growth Theorem because now the number of cycles of length
one (loops) is $n^2$ (the number of states), so the second hypothesis is not
satisfied.




A natural consequence of the Genus Growth Theorem for automata
is an estimation of the genus of regular languages.

Theorem 7 (Genus Growth of Languages). Let $(L_n)_{n \in \mathbb{N}}$ be a
sequence of regular languages on $m$ letters, $m \ge 4$. Suppose that for
each $n$ large enough, any automaton recognizing $L_n$ has at least $n$
states and that the numbers of its cycles of length 1 and 2 are negligible
with respect to $n$. Then for any $\varepsilon > 0$, there is $N > 0$ such
that for all $n \ge N$,

\[ 1 + \left( \frac{m-3}{6m} - \varepsilon \right) m n \le g(L_n) \le m n. \]

Proof. The upper bound for the genus follows from Prop. 2. The lower
bound is a direct consequence of the Genus Growth Theorem. ■

In particular, under the hypothesis of Theorem 7, the genus $g(L_n)$
grows linearly in the size $n$ of the minimal automaton representing $L_n$.
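To get a feeling for the linear window of Theorem 7, one can evaluate both bounds numerically. This is a small illustrative helper only: the name `genus_bounds` is an assumption made here, and the values are meaningful only for $n$ beyond the threshold $N$, which the theorem does not make explicit.

```python
from fractions import Fraction

def genus_bounds(m, n, eps):
    """Lower and upper bounds of Theorem 7 for g(L_n).

    m   -- number of letters (the lower bound requires m >= 4)
    n   -- number of states of the minimal automaton
    eps -- the positive tolerance epsilon, given as a Fraction
    """
    if m < 4:
        raise ValueError("the lower bound requires at least four letters")
    lower = 1 + (Fraction(m - 3, 6 * m) - eps) * m * n
    upper = m * n
    return lower, upper

lower, upper = genus_bounds(4, 1000, Fraction(1, 100))
print(float(lower), upper)
```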

We take up the question of explicitly constructing such a sequence
of regular languages in §6. There we detail an explicit construction that
shows that there is a hierarchy of regular languages based on the genus
(Th. 9).

Another application of the Genus Growth Theorem is the estimation 
of the genus of product automata. 



4.1 The genus of product automata 

It is well known that the size of the product automaton corresponding to
the union of two deterministic automata $A$ and $B$ is bounded by
$m \times n$, the product of the size of $A$ and the size of $B$. This bound
is actually attained, as presented by S. Yu in [Yu00]. By Prop. 2, up to a
linear factor due to the size of the alphabet, $m \times n$ is also an upper
bound on the genus of the product automaton. We prove that it is also a
lower bound.

Corollary 4. There is a family $(A_m, B_n)_{m \in \mathbb{N}, n \in \mathbb{N}}$
of planar automata $A_m$ and $B_n$, of respective sizes $m$ and $n$, such
that the deterministic minimal automaton $A_m \cup B_n$ has genus
$\Theta(m \times n)$.

Proof. Let $A_m$ be the $m$-state automaton defined as follows.

[Figure: the automaton $A_m$; the transition labels shown are $b$ and $a, c, d$.]

Let $B_n$ be the $n$-state automaton defined as follows.

[Figure: the automaton $B_n$.]

The minimal automaton $A_m \cup B_n$ has size $m \times n$ and it contains
neither loops nor bigons. Thus, Theorem 6 applies and leads to the conclusion.



4.2 The exponential genus growth of determinization 

We prove that determinization leads to an exponential growth, as is
the case for state complexity (see for instance [GMRY12]). Consider the
following family of automata $(A_n)_{n \in \mathbb{N}^{\times}}$. The
alphabet $\mathcal{A}_n = \{x_1, \ldots, x_n\}$ is a set of cardinality $n$.
The states of $A_n$ consist of one initial state $s_0$, $n$ states
$s_1, \ldots, s_n$ (one state for each letter) and one trash state. All
states except the initial state and the trash state are final. The
transitions of $A_n$ are defined as follows:

• From the initial state $s_0$ to each state $s_i$ ($1 \le i \le n$), there
are $n - 1$ transitions whose labels lie in $\mathcal{A}_n - \{x_i\}$.

• From each state $s_i$ ($1 \le i \le n$) to itself, there are $n - 1$
transitions whose labels lie in $\mathcal{A}_n - \{x_i\}$.

• From each state $s_i$ ($1 \le i \le n$) to the trash state, there is one
transition whose label is $x_i$. (One can add $n$ loops with labels
$x_1, \ldots, x_n$ to the trash state so that the resulting automaton is
complete.)

It follows from the definition that the language recognized by $A_n$ is
the set of words containing at most $n - 1$ distinct letters. It is also
clear from the definition that for any $n \ge 2$, $A_n$ is planar and
nondeterministic. (Note that whether or not we include the trash state,
with or without its loops, is irrelevant.)

Theorem 8. The determinization of $A_n$ is minimal and has genus

\[ g_n \ge 1 + \left( \frac{n}{4} - 1 \right) 2^{n-1}. \]

For instance, $g_4 \ge 1$ so the determinization $A_4^{\det}$ of $A_4$ is
not planar. This can also be seen by Kuratowski's theorem (one can check
that $A_4^{\det}$ contains the utility graph $K_{3,3}$). It is not hard to
embed $A_3^{\det}$ into a plane, so $g_3 = 0$. Of course the meaning of the
theorem is that the genus of $A_n^{\det}$ grows at least exponentially in $n$.

Fig. 2. The automaton $A_4$ and its determinized form.

Proof. Let us describe an isomorphic variant $A_n^{\det}$ of the
determinized form of $A_n$ by the powerset method. The states of
$A_n^{\det}$ consist of all subsets of $\mathcal{A}_n$. The initial state of
$A_n^{\det}$ is $\mathcal{A}_n$ itself. The trash state is the empty set.
Any state but the trash state and the initial state is a final state.

Therefore, the number $e_0$ of states of $A_n^{\det}$ is $2^n$. The
transitions are described as follows. For each letter $x \in \mathcal{A}_n$
there is one transition from $S$ to the state $S - \{x\}$. Minimality
follows from definitions: there are no indistinguishable states.

Let us consider the number $e_1^0$ of transitions of $A_n^{\det}$ that are
loops. By definition, each state labelled by a subset of cardinality $k$
contributes exactly $n - k$ loops. We conclude that

\[ e_1^0 = \sum_{k=0}^{n} \binom{n}{k} (n - k) = \sum_{k=0}^{n} \binom{n}{k} k = n \, 2^{n-1}. \tag{4} \]

It follows that exactly half of the transitions are loops:

\[ e_1 = 2 e_1^0. \tag{5} \]

Consider now a minimal embedding of $A_n^{\det}$ into a closed oriented
surface $\Sigma$. Consider one loop $l$ in $\Sigma$. Since it is bifacial,
it is the intersection of exactly two distinct adjacent closed 2-cells.
Therefore removing the loop (while keeping the state) amounts to merging two
2-cells into one 2-cell. The union of states and transitions (minus $l$)
still induces a CW-complex decomposition of $\Sigma$. Therefore, according
to Euler's relation, the genus of $\Sigma$ is unaffected. We can therefore
remove all loops from $\Sigma$. Thus we can assume that $e_1 = e_1^0$
(from (5)) and $f_1 = 0$.

Lemma 4. For the new graph minimally embedded in $\Sigma$, the following
properties hold:

• $f_2 = 0$;

• For any $k \ge 1$, $f_{2k+1} = 0$;

• For any $k \ge n$, $f_{2k} = 0$.

Proof. These observations are consequences of the particular structure of
the original graph $A_n^{\det}$: they follow from the definition of
$A_n^{\det}$ and are left to the reader. ■

We return to the proof of Theorem 8. We have

\[ 2e_1 = f_1 + 2f_2 + 3f_3 + 4f_4 + \cdots = 4f_4 + 6f_6 + \cdots + (2n - 2) f_{2n-2}. \]

The first equality is relation (9) and the second equality follows from
Lemma 4. Since all numbers are nonnegative, we have

\[ 2e_1 \ge 4 (f_4 + f_6 + \cdots + f_{2n-2}) = 4 e_2. \]

Thus $e_2 \le \frac{1}{2} e_1$. From Euler's relation, we deduce that

\[ 2g = 2 - e_0 + e_1 - e_2 \ge 2 - e_0 + e_1 - \frac{1}{2} e_1 = 2 - e_0 + \frac{1}{2} e_1. \]

Substituting values for $e_0$ and $e_1$, we obtain

\[ 2g \ge 2 + \left( \frac{n}{4} - 1 \right) 2^n. \]

This is the desired result. ■
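The counts used in the proof of Theorem 8 can be reproduced by running the powerset construction on $A_n$ for small $n$. The sketch below is an illustration under assumptions made here: the state names (`"s0"` and the integers 1..n), the implicit trash state (the empty subset), and the helper names are not taken from the article.

```python
from fractions import Fraction
from itertools import chain

def determinize(n):
    """Powerset construction applied to the automaton A_n of Section 4.2.

    NFA states: "s0" (initial) and 1..n, where state i forbids the letter x_i.
    The trash state is left implicit: it corresponds to the empty subset.
    Returns the transition table {subset: {letter: subset}}.
    """
    letters = range(1, n + 1)

    def nfa_step(state, x):
        if state == "s0":
            return {i for i in letters if i != x}
        return {state} if state != x else set()

    start = frozenset(["s0"])
    table, todo = {}, [start]
    while todo:
        subset = todo.pop()
        if subset in table:
            continue
        table[subset] = {}
        for x in letters:
            target = frozenset(chain.from_iterable(nfa_step(q, x) for q in subset))
            table[subset][x] = target
            todo.append(target)
    return table

def loop_count(table):
    """Number of transitions of the determinized automaton that are loops."""
    return sum(1 for s, row in table.items() for t in row.values() if t == s)

def genus_lower_bound(n):
    """Lower bound of Theorem 8: 1 + (n/4 - 1) * 2**(n - 1)."""
    return 1 + (Fraction(n, 4) - 1) * 2 ** (n - 1)
```

For $n = 4$ the table has $2^4 = 16$ states and $4 \cdot 2^3 = 32$ loops among its $64$ transitions, in agreement with (4) and (5).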

5 State-minimal automata versus genus-minimal automata

Minimal automata, as given by the Myhill-Nerode Theorem, have the
remarkable property of being unique up to isomorphism, leading to a fruitful
relation between rational languages and automata. In this section, we show
that state-minimality is a notion orthogonal to genus-minimality. First,
consider the following proposition:

Proposition 7. There are deterministic automata with a genus strictly
lower than the genus of their corresponding minimal automaton.

Proof. Let $K_5$, $K'$ be the automata:




Clearly, $K_5$ and $K'$ represent the same language $\mathcal{L}$, $K_5$ is
minimal, $K_5$ has genus 1 and $K'$ is planar.

Example 12. Minimal automata do not necessarily have the maximal genus.
For instance, the following deterministic automaton




has genus 1, but its minimal corresponding automaton 




has genus 0. 

Contrary to state minimization, there is no isomorphism between
genus-minimal automata, even restricted to minimal state size. Indeed,
the automaton $K''$ below on the right represents $\mathcal{L}$, but it is
not isomorphic to $K'$ on the left (the vertex 4 of automaton $K'$ has 6
adjacent edges, the automaton $K''$ has no vertex with 6 adjacent edges):

[Figure: the automata $K'$ (left) and $K''$ (right).]

To sum up, the two automata $K'$ and $K''$ (a) represent the same language,
(b) have equal and minimal genus, (c) have minimal size given that genus,
(d) have non-isomorphic underlying graphs.

6 A hierarchy of regular languages 

Is there always a planar deterministic representation of a regular
language? For finite languages, the answer is positive. Indeed, finite
languages are represented by trees (which are planar). In general, as
evidenced by the Genus Growth Theorem for regular languages, the answer
is a big "no". This section is devoted to an explicit constructive proof.

Theorem 9 (Genus-Based Hierarchy). There are regular languages 
of arbitrarily large genus. 

Proof. Consider the alphabet $A = \{a, b, c, d\}$. For all $n > 0$, let
$U_n$ be the intersection of the languages $\mathcal{L}(A_n)$ and
$\mathcal{L}(B_n)$ (see §4.1 for the definition). Consider an arbitrary
deterministic automaton $C_n = (Q, A, s, F, \delta)$ representing $U_n$.

First, as justified by Proposition 1, we can suppose without loss of
generality that any state of $C_n$ is reachable.

Second, the automaton $C_n$ is necessarily complete. Indeed, for any
word $w$, there is an extension of $w$ which belongs to the language (since
the final state can be reached), so that any word must be read completely.

The automaton $C_n$ has the following key properties:

(i) $C_n$ has at least $n^2$ states,

(ii) $C_n$ has no loops,

(iii) $C_n$ has no bigons.

Since the size of the alphabet $A$ is 4, by the genus formula (2) (Th. 5),
we conclude from (i)-(iii) that $g(C_n) = \Omega(n^2)$ and the result
follows. So, it remains to prove the three properties above.

Let us consider the function $\alpha : A \to \mathbb{N} \times \mathbb{N}$
with $\alpha(a) = (1, 0)$, $\alpha(b) = (0, 1)$, $\alpha(c) = (1, 1)$ and
$\alpha(d) = (-1, 1)$. The function $\alpha$ extends to words in $A^*$ by
means of the equations $\alpha(\epsilon) = (0, 0)$ and
$\alpha(e \cdot w) = \alpha(e) + \alpha(w) \bmod (n, n)$ for $e \in A$,
$w \in A^*$, where $(x, y) + (x', y') \bmod (u, v) = (x + x' \bmod u, y + y' \bmod v)$.
A crucial observation is that

\[ w \in U_n \iff \alpha(w) = (0, 0). \tag{6} \]



Remark that $\alpha$ induces a function (again denoted $\alpha$) on states,
defined as follows. For the initial state $s$, let $\alpha(s) = (0, 0)$.
Consider now a state $q \ne s$. Since it is reachable, there is some word
$w$ such that $\delta(s, w) = q$. Define $\alpha(q) = \alpha(w)$. Note that
the function is well-defined, that is, the definition does not depend on
$w$. Indeed, suppose the existence of a word $w'$ such that
$\delta(s, w') = q$ and $\alpha(w') \ne \alpha(w)$. Let $\alpha(w) = (i, j)$.
Consider $v = a^{n-i} \cdot b^{n-j}$. Then
$\alpha(w \cdot v) = (0, 0) \ne \alpha(w' \cdot v)$. And thus, $v$ is a
distinguishing extension of $w$ and $w'$. This implies
$\delta(s, w) \ne \delta(s, w')$, in contradiction with the hypothesis.

We come now to the three properties.

(i) Let $0 \le i \le n - 1$ and $0 \le j \le n - 1$. Consider the word
$v_{i,j} = a^i \cdot b^j$. Then $\alpha(v_{i,j}) = (i, j)$. Thus, there are
at least $n \times n$ states.

(ii) Suppose that at state $q$, there is a loop labelled by a letter
$e \in A$. Since $q$ is reachable, there is some word $w$ such that
$\delta(s, w) = q$. Since $\delta(s, w \cdot e) = \delta(s, w)$, we can
state that $\alpha(w \cdot e) = \alpha(w)$, which itself implies
$\alpha(e) = (0, 0)$. But there is no such letter $e$ in $A$.

(iii) Similarly to (ii), since there are no two distinct letters $e$ and
$e'$ such that $\alpha(e) = \alpha(e')$, there are no bigons formed by two
parallel transitions between the same pair of states. Since there are no
two letters $e$ and $e'$ such that $\alpha(e) + \alpha(e') = (0, 0)$, there
are no bigons formed by two transitions with opposite orientations. Thus,
there are no bigons. ■
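The three arithmetic facts about $\alpha$ used in (i)-(iii) are easy to test for a given $n$. The sketch below is illustrative only; the names `ALPHA`, `alpha` and `check_key_properties` are assumptions made here. Note that the checks for (iii) fail for $n = 2$ (for instance $\alpha(a) + \alpha(a) = (0, 0)$ modulo 2), so the sketch is meant for $n \ge 3$.

```python
from itertools import product

# Values of alpha on the four letters, as defined in the proof.
ALPHA = {"a": (1, 0), "b": (0, 1), "c": (1, 1), "d": (-1, 1)}

def alpha(word, n):
    """Value of alpha on a word, computed modulo (n, n)."""
    x = y = 0
    for letter in word:
        dx, dy = ALPHA[letter]
        x, y = (x + dx) % n, (y + dy) % n
    return x, y

def check_key_properties(n):
    """Return three booleans mirroring properties (i), (ii) and (iii)."""
    # (i) the words a^i b^j take n * n distinct values
    values = {alpha("a" * i + "b" * j, n) for i in range(n) for j in range(n)}
    many_states = len(values) == n * n
    # (ii) no letter is sent to (0, 0), hence no loops
    no_loops = all(alpha(e, n) != (0, 0) for e in ALPHA)
    # (iii) no two distinct letters share a value, and no two letters
    #       (possibly equal) sum to (0, 0), hence no bigons
    no_parallel = len({alpha(e, n) for e in ALPHA}) == len(ALPHA)
    no_two_cycle = all(
        alpha(e + f, n) != (0, 0) for e, f in product(ALPHA, repeat=2)
    )
    return many_states, no_loops, no_parallel and no_two_cycle
```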



7 Nondeterministic planar representation 

The genus of a regular language $L$ was defined above as the minimal genus
of a deterministic automaton recognizing $L$. In this section, we point out
that the word "deterministic" is crucial in the previous sentence.
The following result is essentially proved by R.V. Book and A.K. Chandra
[BoCh76, Th. 1a & 1b]. (See also [BP99].)

Theorem 10 (Planar Nondeterministic Representation). For any
regular language $L$, there exists a planar nondeterministic automaton $A$
recognizing $L$.

Proof. We include two proofs for the convenience of the reader. Both
follow closely [BoCh76] with minor modifications. Let $L = L(R)$ be a
regular language given by a regular expression $R$. We shall show that
$L = L(A)$ for some planar nondeterministic automaton $A$.

The proof follows the recursive definition of a regular expression. An
expression that is not the empty string is regular if and only if it is
constructed from a finite alphabet using the operations of union,
concatenation and Kleene's "+"-operation. Consider the class C of planar
finite nondeterministic automata that have exactly one initial state and
exactly one final state, such that the initial state and the final state
are distinct.

Clearly, C contains an automaton that recognizes the regular expressions
$R = \emptyset$ (take $A$ to be the automaton with two states, one initial,
one final and no transition) and $R = a \in \mathcal{A}$ (take the automaton
with two states, one initial, one final and one $a$-labelled transition from
the initial state to the final state).

Next, we show that the class C is closed under the three operations 
mentioned above. Suppose given two subexpressions R and S recognized 
by A and B in C respectively. 

Consider union: first we construct an automaton $A + B$ with ε-transitions
that recognizes $R + S$.

[Figure: the automaton $A + B$, built from $A$ and $B$ with ε-transitions.]

Define an ε-removal operation as follows. Consider an ε-transition that
goes from state $q_1$ to state $q_2$. (We assume that $q_1 \ne q_2$.) We
suppress the ε-transition and merge the two states $q_1$ and $q_2$ into one
state $q$. Ascribe all incoming and outgoing transitions at $q_1$ and $q_2$
respectively, to the new state $q$. The ε-removal is best visualized by
pulling the state $q_2$ back to the state $q_1$, or by pushing the state
$q_1$ forward to the state $q_2$ before actually merging them.*

* Note that the result of the ε-removal operation does not depend on the
orientation of the transitions.

We apply this operation four times (in any order) to the automaton
above. Clearly the result is an automaton that remains in C.

Consider concatenation: the following planar automaton with one
ε-transition recognizes the expression $R \cdot S$.

[Figure: the automata $A$ and $B$ joined by one ε-transition.]

Next we remove the ε-transition by the ε-removal operation. This
provides us with the desired automaton in C.

Finally consider Kleene's operation: suppose that the automaton $A$
recognizes the expression $R$. The following planar automaton with three
ε-transitions recognizes $R^+ = \bigcup_{k \ge 1} R^k$.

[Figure: the automaton for $R^+$, with three ε-transitions.]

We remove the ε-transitions as before. This leaves us with the desired
automaton in C. This finishes the first proof.

The second proof is short but clever. Define $A_n$ to be the following
deterministic finite automaton with set of states $[n] = \{1, \ldots, n\}$
and alphabet $\mathcal{A}_n = \{a_{ij} \mid 1 \le i, j \le n\}$. For
$1 \le i, j \le n$, set a transition with symbol $a_{ij}$ from $i$ to $j$.
We take 1 to be the initial state and 2 to be the final state.

Claim 1. The automaton $A_n$ has the universal property that any
nondeterministic $n$-state automaton $A = ([n], \mathcal{A}, 1, \delta, 2)$
can be recovered (up to equivalence) by "parallelization" of the transitions
of $A_n$.

Proof of the claim. Build an $n$-state automaton $C_n$ by replacing each
transition $i \xrightarrow{a_{ij}} j$ in $A_n$ by
$T_{ij} = \{a \in \mathcal{A} \mid j \in \delta(i, a)\}$. (If $T_{ij}$ is
empty, then we remove the transition $a_{ij}$. Otherwise, we have $|T_{ij}|$
distinct "parallel" transitions from $i$ to $j$.) The automaton $C_n$ is
equivalent to $A$.
■
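The parallelization of Claim 1 amounts to regrouping the transition relation of $A$ by pairs of states. A minimal sketch follows, assuming (for illustration only) that $\delta$ is given as a Python dictionary from (state, letter) pairs to sets of states; the function names are hypothetical.

```python
from itertools import product

def parallelize(n, alphabet, delta):
    """T[(i, j)] = set of letters a such that j belongs to delta(i, a)."""
    return {
        (i, j): {a for a in alphabet if j in delta.get((i, a), set())}
        for i in range(1, n + 1)
        for j in range(1, n + 1)
    }

def accepts_nfa(delta, word, start=1, final=2):
    """Run the original nondeterministic automaton A on a word."""
    current = {start}
    for a in word:
        current = {j for i in current for j in delta.get((i, a), set())}
    return final in current

def accepts_parallel(n, T, word, start=1, final=2):
    """Run the automaton C_n obtained from the sets T_ij on a word."""
    current = {start}
    for a in word:
        current = {j for i in current for j in range(1, n + 1) if a in T[(i, j)]}
    return final in current
```

On any example the two acceptance functions agree word by word, which is the content of the equivalence of $C_n$ and $A$; the brute-force comparison is only a check on small inputs, not a proof.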

Claim 2. If $A_n$ has an equivalent planar automaton then any
nondeterministic $n$-state automaton $A = ([n], \mathcal{A}, 1, \delta, 2)$
has an equivalent planar automaton.

Proof of the claim. We proceed as in the proof above with $B_n$ instead
of $A_n$, observing that parallelization preserves planarity. ■

It remains to construct a planar automaton $B_n$ equivalent to $A_n$. The
construction goes by induction on $n$. For $n = 3$, as a graph, $A_3$ is the
complete graph on three vertices, hence is planar. So we take $B_3 = A_3$.
Suppose we have constructed a planar automaton $B_n$, equivalent to $A_n$,
together with an embedding of $B_n$ into $\mathbb{R}^2$ and a surjective map
$\alpha : Q_n \to [n]$ from the set of states of $B_n$ to the set of states
of $A_n$. We have to construct $B_{n+1}$. Consider $B_n \subset \mathbb{R}^2$.
For any pair of distinct states $q, q'$ of $B_n$, merge the transitions from
$q$ to $q'$ and from $q'$ to $q$ into one unoriented edge. (If there is no
transition, we do not perform any operation.) Finally we remove loops at
each state. We obtain in this fashion an undirected simple graph $G_n$ whose
vertices are exactly the states of $B_n$. For each face $f$ of
$\mathbb{R}^2 - G_n$, place one vertex $v$ inside $f$ except for the
exterior face (the unbounded component of $\mathbb{R}^2 - G_n$), and connect
it to all vertices of the face $f$ and itself. We obtain a new graph
$G_{n+1}$. See the figure below for the recursive construction of $G_3$,
$G_4$ and $G_5$.




We extend $\alpha$ by setting $\alpha(v) = n + 1$. We restore all previous
(oriented) transitions between any pair of vertices, we label the new loop
at $v$ by the symbol $a_{n+1,n+1}$ and we unfold each newly created edge
from $v$ to any other (old) vertex $w$ into two transitions with opposite
orientations with symbols $a_{n+1,\alpha(w)}$ and $a_{\alpha(w),n+1}$
respectively.

This yields a new automaton $B_{n+1}$. It is clear that the recursive step
does not affect the initial state and the final state of $B_{n+1}$ (that
were already constructed together with $B_3$). The automaton $B_{n+1}$ is
planar since $G_{n+1}$ is planar and the unfolding of the edges preserves
planarity. It remains to see that $B_{n+1}$ is equivalent to $A_{n+1}$. It
follows from the definition of $B_n$ that for $q, q' \in Q_n$ and
$a_{ij} \in \mathcal{A}_n$, there is an $a_{ij}$-transition from $q$ to $q'$
if and only if $\alpha(q) = i$ and $\alpha(q') = j$. It follows that every
word recognized by $B_n$ is also recognized by $A_n$. To prove the converse,
one shows that for any sequence $x_1 = 1, x_2, \ldots, x_k = 2$ in $[n]$
(which is a word in the language recognized by $A_n$), there is a path†
$y_1, \ldots, y_k$ in $B_n$ such that $\alpha(y_j) = x_j$ for each
$1 \le j \le k$. This is proved by induction on $k \le m$, using the facts
that it is true for $m = 3$ and that $G_n$ contains isomorphic copies of
$G_{n-1}$. ■
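The recursive construction of $G_n$ and the lifting property used in the converse direction can be explored by brute force for small $n$. The sketch below rests on assumptions made here for illustration: vertices are numbered in order of creation, only the bounded faces are tracked (all of them are triangles, since each step subdivides a triangle into three), and `has_walk` allows a state to be repeated through its loop.

```python
from itertools import product

def build_G(n):
    """Return (labels, adjacency) for the graph G_n, for n >= 3.

    labels[v] is alpha(v); adjacency[v] is the set of neighbours of v.
    """
    labels = {0: 1, 1: 2, 2: 3}
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}   # G_3 is a triangle
    faces = [(0, 1, 2)]                       # bounded faces only
    for new_label in range(4, n + 1):
        new_faces = []
        for (p, q, r) in faces:
            v = len(labels)                   # fresh vertex placed inside the face
            labels[v] = new_label
            adj[v] = {p, q, r}
            for u in (p, q, r):
                adj[u].add(v)
            new_faces += [(p, q, v), (q, r, v), (p, r, v)]
        faces = new_faces
    return labels, adj

def has_walk(labels, adj, seq):
    """Is there a sequence of states y_1..y_k with alpha(y_j) = seq[j],
    consecutive states being adjacent or equal (loop)?"""
    current = {v for v in labels if labels[v] == seq[0]}
    for x in seq[1:]:
        current = {w for v in current for w in adj[v] | {v} if labels[w] == x}
    return bool(current)

labels, adj = build_G(5)
print(len(labels))   # number of states of B_5 under this construction
```

Since the states labelled 1 and 2 are unique, a lifted sequence starting with 1 and ending with 2 automatically runs from the initial state to the final state.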



8 Proof of the 1-gon Lemma
8.1 Proof of Lemma 2

Geometrically, a bifacial embedded loop is nothing else than a separating
simple closed curve with a basepoint. It suffices to prove that a
contractible simple closed curve is separating. Consider an embedded loop
$\alpha$ in $\Sigma$ based at $q \in \Sigma$. Assume that $\alpha$ is
monofacial (nonseparating). Consider a small segment $I$ transversal (say,
normal) to $\alpha$ such that $I \cap \alpha = \{q\}$. Since $\alpha$ is
monofacial, the endpoints of $I$ lie in the same connected component of
$\Sigma - A$. Hence $I$ extends to a loop $\beta$ such that
$\beta \cap A = \beta \cap \alpha = \{q\}$. It follows that the algebraic
1-homology intersection $[\beta] \cdot [\alpha] = \pm 1$. In particular,
$[\alpha] \ne 0$ in $H_1(\Sigma)$. Thus $\alpha$ is not contractible. ■



8.2 Proof of the 1-gon Lemma 

Consider a state $q$ of $A \subset \Sigma$ that has at least one
noncontractible loop. Consider a small enough open disc $D$ in $\Sigma$
centered in $q$ such that the following properties hold:
1) $D \cap (\Sigma - A)$ is a disjoint union of open cells;
2) The intersection $D \cap A$ is a wedge of semi-open arcs intersecting in
their common endpoint $q$; 3) Each arc $a$ is bifacial: there are exactly
two adjacent cells $c, c' \in C = \{c_1, \ldots, c_r\}$ such that
$a \subset \mathrm{Fr}(c) \cap \mathrm{Fr}(c')$.

Let $\mathcal{A}$ be the set of arcs. The orientation of $\Sigma$ induces a
circular ordering $a_1, c_1, a_2, c_2, \ldots, c_r$ of $\mathcal{A} \cup C$
where the arcs and cells alternate and such that any two consecutive cells
are adjacent.

We fix now an arc $a_1$ and perform successively the following operations
on the arcs following the circular ordering. If the arc $a_j$ does not
belong to a loop (i.e. is part of a transition that is not a loop), we do
not do anything. Otherwise there is another arc $\beta$ belonging to the
same loop. If the two arcs are enumerated consecutively in the circular
ordering, we remove the whole loop inside $\Sigma$ and replace it by a small
1-gon $l$ based at $q$ such that $l - q$ lies entirely in the open cell
$c_j$. At the end of the process, we have replaced all cycles of length 1 by
contractible loops, hence by 1-gons. This does not change the surface hence
it does not affect the genus of the embedding. In particular, if the
embedding is minimal, the new embedding remains minimal (hence cellular),
with the desired properties. Now by Lemma 2 each 1-gon consists of one
bifacial edge. ■

† A path is a walk such that no edge occurs more than once and no internal
vertex is repeated.

9 Proof of the Genus Formula 
9.1 Preliminary results 

Consider a minimal embedding of an automaton $A$ into a closed oriented
surface $\Sigma$. We let $e_0$ denote the number of 0-cells (points, i.e.
states), $e_1$ the number of 1-cells (open transitions) and $e_2$ the number
of 2-cells (that is, the number of connected components of $\Sigma - A$).
The first classical result is Euler's formula ([Eul36], [Bre93, Chap. IV,
§13]) that relates the genus to the CW-decomposition of $\Sigma$. In our
context, since $\Sigma$ is oriented and minimal, the formula takes the
following form.

Lemma 5 (Euler's formula).

\[ \chi(\Sigma) = 2 - 2g(A) = e_0 - e_1 + e_2. \tag{7} \]

Another useful observation is a consequence of the decomposition
$\pi_0(\Sigma - A) = \coprod_{k > 0} F_k$. Namely,

\[ e_2 = f_1 + f_2 + f_3 + \cdots = \sum_{k > 0} f_k. \tag{8} \]

The sum above is finite since the total number of 2-cells is finite. In
particular, there is a maximal index $k > 0$ such that $f_k > 0$ and
$f_l = 0$ for all $l > k$.

We need one more result that relates the number of 1-cells to the
number of faces.

Lemma 6.

\[ 2e_1 = f_1 + 2f_2 + 3f_3 + \cdots = \sum_{k > 0} k f_k. \tag{9} \]

Proof. We begin with the relation (1): for each transition $e$,
$\sum_{c \in \pi_0(\Sigma - A)} (e, c) = 2$. It follows that, summing over
all transitions $e$,

\[ \sum_{e} \sum_{c \in \pi_0(\Sigma - A)} (e, c) = 2e_1. \]

Now use the decomposition of the cells into $k$-faces:
$\pi_0(\Sigma - A) = \coprod_{k > 0} F_k$.

\[ \sum_{e} \sum_{c} (e, c) = \sum_{c} \sum_{e} (e, c) = \sum_{k > 0} \sum_{c \in F_k} \sum_{e} (e, c) = \sum_{k > 0} k f_k, \]

where we used the relation $\sum_{e} (e, c) = k$ for a $k$-face $c$. This
completes the proof. ■

9.2 Proof of Theorem 5 (Genus formula)

Consider a cellular embedding of $A$ into a closed oriented surface
$\Sigma$. Euler's formula (7) for the genus of $\Sigma$ gives
$g_\Sigma = 1 - \frac{e_0 - e_1 + e_2}{2}$. Since the automaton is complete,
each state has exactly $m$ outgoing transitions. Therefore $e_0 = e_1/m$.
Next use the relations (8) and (9) to express $e_1$ and $e_2$ in terms of
the $k$-faces. This yields the formula

\[ g_\Sigma = 1 + \sum_{k=1}^{\infty} \frac{k(m-1) - 2m}{4m} f_k. \]

Now $g(A) \le g_\Sigma$ with equality if and only if the embedding into
$\Sigma$ is minimal. This achieves the proof. ■
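For the reader's convenience, the substitution in the proof above can be spelled out line by line. This is only a worked version of the computation already described, using $e_0 = e_1/m$, Lemma 6 for $e_1$, and the face count $e_2 = \sum_{k \ge 1} f_k$:

```latex
\begin{align*}
g_\Sigma &= 1 - \tfrac{1}{2}\,(e_0 - e_1 + e_2)
  && \text{by Euler's formula} \\
 &= 1 + \frac{m-1}{2m}\, e_1 - \frac{1}{2}\, e_2
  && \text{since } e_0 = e_1/m \\
 &= 1 + \frac{m-1}{4m} \sum_{k \ge 1} k f_k - \frac{1}{2} \sum_{k \ge 1} f_k
  && \text{since } 2e_1 = \sum_{k \ge 1} k f_k,\; e_2 = \sum_{k \ge 1} f_k \\
 &= 1 + \sum_{k \ge 1} \frac{(m-1)k - 2m}{4m}\, f_k .
\end{align*}
```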

10 Proof of the Genus Growth Theorem 

It is convenient to introduce the following functions:

\[ A(n) = \sum_{k \ge 3} \frac{(m-1)k - 2m}{4m} f_k(n) \quad \text{and} \quad B(n) = \sum_{k \ge 3} k f_k(n). \]

We begin with

Lemma 7. There is a constant $a > 0$ such that

\[ A(n) \ge a B(n). \tag{10} \]

Proof. To prove the claim, we first find $a > 0$ such that

\[ \frac{(m-1)k - 2m}{4m} \ge a k \quad \text{for all } k \ge 3. \]

It suffices, therefore, to choose $a$ such that

\[ \frac{m-1}{4m} - \frac{1}{2k} \ge a \quad \text{for all } k \ge 3. \]

This condition is satisfied if we choose

\[ \inf_{k \ge 3} \left( \frac{m-1}{4m} - \frac{1}{2k} \right) = \frac{m-3}{12m} = a_0 \ge a. \]

(Note that $a_0 > 0$ for $m \ge 4$.) This proves the lemma. ■
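The choice of the constant $a_0$ can be double-checked with exact arithmetic. The helper names below are assumptions made for this illustration; the check confirms that the coefficient of $f_k$ dominates $a_0 k$ for $k \ge 3$, with equality exactly at $k = 3$.

```python
from fractions import Fraction

def coefficient(m, k):
    """Coefficient of f_k in the genus formula: ((m - 1) k - 2 m) / (4 m)."""
    return Fraction((m - 1) * k - 2 * m, 4 * m)

def a0(m):
    """The constant of Lemma 7: (m - 3) / (12 m)."""
    return Fraction(m - 3, 12 * m)

# The gap coefficient(m, k) - a0(m) * k equals k/6 - 1/2, independently of m.
for m in range(4, 10):
    for k in range(3, 60):
        assert coefficient(m, k) - a0(m) * k == Fraction(k, 6) - Fraction(1, 2)
```

In particular the gap is nonnegative for every $k \ge 3$, so any $a \le a_0$ satisfies the condition of the lemma.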

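Lemma 8.

\[ \lim_{n \to +\infty} \frac{f_j(n)}{B(n)} = 0 \quad \text{for } j = 1, 2. \tag{11} \]

Proof. Since $f_k \le z_k$, we have
$\frac{f_k(n)}{n} \le \frac{z_k(n)}{n} \to 0$ as $n \to +\infty$ for
$k = 1, 2$. Thus for any positive constants $a, b$,

\[ \frac{n}{a f_1(n) + b f_2(n)} \to +\infty \quad \text{as } n \to +\infty. \tag{12} \]

Observe that $2mn = 2e_1(n) = f_1(n) + 2f_2(n) + B(n)$. Hence

\[ n = \frac{1}{2m} \left( f_1(n) + 2f_2(n) + B(n) \right). \]

Replacing $n$ in (12) by this expression, with $a = 1/(2m)$ and $b = 1/m$,
we find that

\[ \frac{\frac{1}{2m} B(n)}{\frac{1}{2m} f_1(n) + \frac{1}{m} f_2(n)} = \frac{B(n)}{f_1(n) + 2f_2(n)} \to +\infty \quad \text{as } n \to +\infty. \]

Then

\[ \min\left( \frac{B(n)}{f_1(n)}, \frac{B(n)}{f_2(n)} \right) \ge \frac{B(n)}{f_1(n) + 2f_2(n)}, \]

as desired. ■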

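Let us come to the proof of the Genus Growth Theorem. Let
$a > \varepsilon > 0$ with $a$ satisfying the condition of Lemma 7. Lemma 8
ensures there is $N > 0$ such that for any $n \ge N$,

\[ \varepsilon B(n) \ge \left( a + \frac{m+1}{4m} \right) f_1(n) + \left( 2a + \frac{1}{2m} \right) f_2(n). \]

Hence for $n \ge N$,

\[ A(n) \ge (a - \varepsilon) B(n) + \left( a + \frac{m+1}{4m} \right) f_1(n) + \left( 2a + \frac{1}{2m} \right) f_2(n). \]

Thus

\[ A(n) - \frac{m+1}{4m} f_1(n) - \frac{1}{2m} f_2(n) \ge (a - \varepsilon) \left( B(n) + f_1(n) + 2f_2(n) \right) = 2(a - \varepsilon) e_1(n) = 2(a - \varepsilon) m n. \]

According to Theorem 5,
$A(n) - \frac{m+1}{4m} f_1(n) - \frac{1}{2m} f_2(n) = g(n) - 1$. Thus

\[ g(n) \ge 1 + 2(a - \varepsilon) m n. \]

This achieves the proof of the theorem. ■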

11 Conclusion 

The topological tool we employ here, the genus as a complexity mea- 
sure of the language, leads to a viewpoint that seems orthogonal to the 
standard one: it is not compatible with set-theoretic minimization (that 
is, state minimization). However, the genus does behave similarly to the 
state complexity with respect to operations such as determinization and 
union (up to a linear factor); furthermore, there is a hierarchy of regu- 
lar languages based on the genus. This suggests a more systematic study 
of all operations: e.g. concatenation, star-operation, and composition of
those. We take up this task in a sequel to this paper [BD13].